Use dependency injection for runner #10326

larryliu0820 · 2025-04-21T18:40:26Z

Summary:
Pass in runner components, move most of the instantiation logic from load() to a new static API create().

This adds testability to runner components.
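
A minimal sketch of the shape described above, not the actual ExecuTorch code: the constructor takes already-built components, and a static `create()` keeps string-based call sites working. The `Tokenizer` and `TextTokenGenerator` interfaces, the `ByteTokenizer` and `EchoGenerator` stand-ins, and the `create()` parameters are all hypothetical and simplified for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified interfaces; the real runner components are richer.
struct Tokenizer {
  virtual ~Tokenizer() = default;
  virtual std::vector<uint64_t> encode(const std::string& text) const = 0;
};

struct TextTokenGenerator {
  virtual ~TextTokenGenerator() = default;
  // Returns the next token id given all tokens seen so far.
  virtual uint64_t next(const std::vector<uint64_t>& context) = 0;
};

// Trivial stand-in: one token per byte of the prompt.
struct ByteTokenizer : Tokenizer {
  std::vector<uint64_t> encode(const std::string& text) const override {
    std::vector<uint64_t> ids;
    for (char c : text) {
      ids.push_back(static_cast<unsigned char>(c));
    }
    return ids;
  }
};

// Trivial stand-in: repeats the last token in the context.
struct EchoGenerator : TextTokenGenerator {
  uint64_t next(const std::vector<uint64_t>& context) override {
    return context.empty() ? 0 : context.back();
  }
};

class Runner {
 public:
  // Pass-from-above: the caller decides which components the runner uses.
  Runner(std::unique_ptr<Tokenizer> tokenizer,
         std::unique_ptr<TextTokenGenerator> generator)
      : tokenizer_(std::move(tokenizer)), generator_(std::move(generator)) {
    assert(tokenizer_ && generator_);
  }

  // Convenience factory for call sites that only have paths. In this sketch
  // the paths are only validated; real code would load the artifacts.
  static std::unique_ptr<Runner> create(const std::string& model_path,
                                        const std::string& tokenizer_path) {
    if (model_path.empty() || tokenizer_path.empty()) {
      return nullptr;
    }
    return std::make_unique<Runner>(std::make_unique<ByteTokenizer>(),
                                    std::make_unique<EchoGenerator>());
  }

  // Returns only the newly generated tokens, not the prompt tokens.
  std::vector<uint64_t> generate(const std::string& prompt,
                                 std::size_t max_new_tokens) {
    std::vector<uint64_t> tokens = tokenizer_->encode(prompt);
    const std::size_t prompt_len = tokens.size();
    for (std::size_t i = 0; i < max_new_tokens; ++i) {
      tokens.push_back(generator_->next(tokens));
    }
    return std::vector<uint64_t>(tokens.begin() + prompt_len, tokens.end());
  }

 private:
  std::unique_ptr<Tokenizer> tokenizer_;
  std::unique_ptr<TextTokenGenerator> generator_;
};
```

A test can then construct `Runner` directly with whatever stand-in components it needs, while production call sites go through `Runner::create(...)`.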

Differential Revision: D73165546

pytorch-bot · 2025-04-21T18:40:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10326

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures

As of commit 85d1e5b with merge base 1b063ca:

NEW FAILURES - The following jobs have failed:

trunk / test-arm-backend (test_pytest_ops_ethosu_fvp) / linux-job (gh)
RuntimeError: Command docker exec -t 81cdb9da2b31e89153779722b9adfd189b636d48d2bd719f3304f38f1e236333 /exec failed with exit code 1
trunk / test-models-macos (efficient_sam, portable) / macos-job (gh)
The process '/usr/bin/git' failed with exit code 128

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2025-04-21T18:40:38Z

This pull request was exported from Phabricator. Differential Revision: D73165546

Summary: X-link: pytorch/executorch#10326 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Differential Revision: D73165546

Summary: X-link: meta-pytorch/tokenizers#53 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Differential Revision: D73165546

lucylq · 2025-04-22T20:45:23Z

examples/models/llama/runner/runner.h

 public:
-  explicit Runner(
+  // Static factory method to create a Runner instance
+  static std::unique_ptr<Runner> create(


Will most users interact with runner via this API?

larryliu0820 · 2025-04-30

This lets existing call sites keep getting a Runner instance by passing in these strings.

See the comment below: in the end I want to build logic that takes in the model type and tokenizer type and returns the matching runner.

swolchok

I am generally opposed to both tests that use mocks and dependency injection.

I am generally opposed to mocks because they tend to produce change detector tests. For the case at hand, why not create a Runner and see if it decodes the right number of tokens?

I am generally opposed to dependency injection because pass-from-above is typically all you need in order to accomplish the usual stated goals. In this particular diff, I don't immediately understand what Runner::create is accomplishing; I don't see any sort of flag that causes it to return a different Runner in a test environment, nor do the tests use it.

swolchok · 2025-04-22T21:27:19Z

examples/models/llama/runner/runner.cpp

 } // namespace

-Runner::Runner(
+std::unique_ptr<Runner> Runner::create(


Specifically, it looks like create() could just be another Runner constructor; it's not clear to me why it has to be a static method instead.

swolchok · 2025-04-22T21:28:21Z

examples/models/llama/runner/runner.h

-  [[deprecated(
-      "This constructor is deprecated. Use the constructor without temperature parameter instead.")]]
+  // Constructor with dependency injection
  explicit Runner(


Looks like I was confused in my original comment. This form of dependency injection (pass from above) is fine; I thought Runner::create was going to be part of a trend of giving everything a create method, and therefore of building our own dependency injection framework.

larryliu0820 · 2025-04-30

Yeah, create() is merely a utility that replaces the old constructor, which took in all the strings and instantiated the components.

At a high level, I eventually want to build something like this:

Given a specific model (Llama or Qwen), some logic determines the architecture (it could be hardcoded, listed in a yaml file, or whatever). In the case of Llama and Qwen we know the model type is a decoder-only, text-only LLM, so we need a TextTokenGenerator and a TextPrefiller for it. If we see a Llava model we can instantiate an ImagePrefiller.

Then we can also look at the tokenizer artifacts passed in to create(), tell whether this is an HF tokenizer, a Tiktoken tokenizer, or something else, and instantiate the actual object.

Eventually we pass the TextTokenGenerator, TextPrefiller and Tokenizer objects to the Runner and it should just work.
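
The selection logic described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the enum names, `model_kind_for`, `make_prefiller`, and `detect_tokenizer` are hypothetical, the model names are matched as plain strings, and the file-extension rule for tokenizers is made up for the example rather than taken from the PR.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical categories; not taken from the ExecuTorch sources.
enum class ModelKind { kTextDecoderOnly, kImageText, kUnknown };
enum class TokenizerKind { kHuggingFace, kTiktoken, kUnknown };

// Simplified stand-ins for the prefill components named in the discussion.
struct Prefiller {
  virtual ~Prefiller() = default;
  virtual std::string name() const = 0;
};
struct TextPrefiller : Prefiller {
  std::string name() const override { return "TextPrefiller"; }
};
struct ImagePrefiller : Prefiller {
  std::string name() const override { return "ImagePrefiller"; }
};

// Hardcoded lookup; the thread suggests this could also live in a yaml file.
ModelKind model_kind_for(const std::string& model_name) {
  if (model_name == "llama" || model_name == "qwen") {
    return ModelKind::kTextDecoderOnly;
  }
  if (model_name == "llava") {
    return ModelKind::kImageText;
  }
  return ModelKind::kUnknown;
}

// Picks the prefill component for a model kind; nullptr if unsupported.
std::unique_ptr<Prefiller> make_prefiller(ModelKind kind) {
  switch (kind) {
    case ModelKind::kTextDecoderOnly:
      return std::make_unique<TextPrefiller>();
    case ModelKind::kImageText:
      return std::make_unique<ImagePrefiller>();
    case ModelKind::kUnknown:
      break;
  }
  return nullptr;
}

// Made-up detection rule for illustration only: decide by file extension.
TokenizerKind detect_tokenizer(const std::string& path) {
  auto ends_with = [&path](const std::string& suffix) {
    return path.size() >= suffix.size() &&
        path.compare(path.size() - suffix.size(), suffix.size(), suffix) == 0;
  };
  if (ends_with(".json")) {
    return TokenizerKind::kHuggingFace;
  }
  if (ends_with(".tiktoken")) {
    return TokenizerKind::kTiktoken;
  }
  return TokenizerKind::kUnknown;
}
```

With helpers like these, a `create()`-style factory would resolve the model kind and tokenizer kind first, build the matching objects, and hand them to the Runner constructor.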

Summary: Pull Request resolved: #10326 X-link: meta-pytorch/tokenizers#53 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Differential Revision: D73165546

Summary: X-link: pytorch/executorch#10326 Pull Request resolved: #53 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Differential Revision: D73165546

larryliu0820 · 2025-04-30T23:54:44Z

> I am generally opposed to both tests that use mocks and dependency injection.
>
> I am generally opposed to mocks because they tend to produce change detector tests. For the case at hand, why not create a Runner and see if it decodes the right number of tokens?
>
> I am generally opposed to dependency injection because pass-from-above is typically all you need in order to accomplish the usual stated goals. In this particular diff, I don't immediately understand what Runner::create is accomplishing; I don't see any sort of flag that causes it to return a different Runner in a test environment, nor do the tests use it.

I generally agree that we should not write internal change detectors. The tests I'm adding are slightly different, though: I'm not guarding the internal logic of the runner, but rather checking two things:

1. If we pass a seq_len into generate(), do we generate exactly that number of tokens?
2. If the runner sees an EOS token, does it stop generating further tokens?

I'm not testing implementation details; I just want to make sure the API behaves as expected.

> why not create a Runner and see if it decodes the right number of tokens?

It's hard to test with a real model because (1) it's slow, and (2) it's hard to tell whether we generated the exact number of tokens, since we get back a string rather than tokens.

Hope that makes sense.
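
The two behaviors listed above can be checked without a real model once the token-producing step is injectable. The sketch below is hypothetical and much simpler than the real runner: `generate` here is a free function taking the "next token" step as a callable, the limit is counted in newly generated tokens, and the EOS token is assumed to be included in the output before stopping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical decode loop: next_token(step) plays the role of an injected
// token generator, so a test can script exactly which tokens come back.
std::vector<uint64_t> generate(
    const std::function<uint64_t(std::size_t)>& next_token,
    std::size_t max_new_tokens,
    uint64_t eos_id) {
  std::vector<uint64_t> out;
  for (std::size_t step = 0; step < max_new_tokens; ++step) {
    const uint64_t tok = next_token(step);
    out.push_back(tok);
    if (tok == eos_id) {
      break;  // Assumption: EOS is emitted, then generation stops.
    }
  }
  return out;
}
```

A fake that never returns EOS exercises the token-count limit, and a fake that returns EOS at a chosen step exercises early stopping. Both checks observe tokens directly rather than a decoded string.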

Summary: X-link: pytorch/executorch#10326 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Next step would be moving most of the logic out into `extension/llm/runner/` so that it can be used on non-llama models. Currently the logic for getting tokenizer instance should not assume llama, which I can modify in next diff. Differential Revision: D73165546

Summary: X-link: meta-pytorch/tokenizers#53 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Next step would be moving most of the logic out into `extension/llm/runner/` so that it can be used on non-llama models. Currently the logic for getting tokenizer instance should not assume llama, which I can modify in next diff. Differential Revision: D73165546

Summary: Pull Request resolved: #10326 X-link: meta-pytorch/tokenizers#53 Pass in runner components, move most of the instantiation logic from `load()` to a new static API `create()`. This adds testability to runner components. Next step would be moving most of the logic out into `extension/llm/runner/` so that it can be used on non-llama models. Currently the logic for getting tokenizer instance should not assume llama, which I can modify in next diff. Reviewed By: kirklandsign, iseeyuan Differential Revision: D73165546

larryliu0820 requested review from jackzhxng, iseeyuan, swolchok, kirklandsign, lucylq and shoumikhin as code owners April 21, 2025 18:40

facebook-github-bot added the CLA Signed label Apr 21, 2025

facebook-github-bot added the fb-exported label Apr 21, 2025

facebook-github-bot force-pushed the export-D73165546 branch from 1fe0afa to 16e3256 Compare April 21, 2025 19:12

lucylq reviewed Apr 22, 2025

swolchok reviewed Apr 22, 2025

larryliu0820 force-pushed the export-D73165546 branch from 16e3256 to 91c0f0a Compare April 22, 2025 23:07

larryliu0820 force-pushed the export-D73165546 branch from 91c0f0a to c4bf4be Compare April 22, 2025 23:11

larryliu0820 mentioned this pull request Apr 30, 2025

Use external hf_tokenizer in llama runner #9112

Merged

facebook-github-bot force-pushed the export-D73165546 branch from c4bf4be to 869a555 Compare May 19, 2025 19:54

larryliu0820 force-pushed the export-D73165546 branch from b8f980f to 4d84fae Compare May 20, 2025 22:22

iseeyuan approved these changes May 20, 2025

larryliu0820 force-pushed the export-D73165546 branch from 4d84fae to c563b82 Compare May 20, 2025 22:32

larryliu0820 added the ciflow/trunk label May 20, 2025

larryliu0820 force-pushed the export-D73165546 branch from c563b82 to db324db Compare May 21, 2025 17:21

larryliu0820 force-pushed the export-D73165546 branch from db324db to ebfa441 Compare May 21, 2025 17:57

larryliu0820 force-pushed the export-D73165546 branch from ebfa441 to 17ed682 Compare May 21, 2025 18:08

larryliu0820 force-pushed the export-D73165546 branch from 17ed682 to 8fc5cba Compare May 21, 2025 18:29

larryliu0820 force-pushed the export-D73165546 branch from 8fc5cba to 01d09f2 Compare May 21, 2025 18:35

larryliu0820 force-pushed the export-D73165546 branch from 01d09f2 to 85d1e5b Compare May 21, 2025 20:21

larryliu0820 requested a review from tarun292 as a code owner May 21, 2025 20:21

facebook-github-bot merged commit 3f19793 into main May 22, 2025
279 of 286 checks passed

facebook-github-bot deleted the export-D73165546 branch May 22, 2025 00:47
